ON THE ACCURACY OF SOLVING CONFLUENT PRONY SYSTEMS* 
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Abstract. In this paper we consider several nonlinear systems of algebraic equations which can be called "Prony-type". 
These systems arise in various reconstruction problems in several branches of theoretical and applied mathematics, such as 
frequency estimation and nonlinear Fourier inversion. Consequently, the question of stability of solution with respect to errors 
in the right-hand side becomes critical for the success of any particular application. We investigate the question of "maximal 
possible accuracy" of solving Prony-type systems, putting stress on the "local" behavior which approximates situations with low 
absolute measurement error. The accuracy estimates are formulated in very simple geometric terms, shedding some light on 
the structure of the problem. Numerical tests suggest that "global" solution techniques such as Prony's algorithm and ESPRIT 
method are suboptimal when compared to this theoretical "best local" behavior. 
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1. Introduction. 

l.A. Problem definition. Consider the following system of algebraic equations: 

K 

Y, a itf = m k (i-i) 

i=l 

where Oi,£i € C are unknown parameters and the measurements {?rifc} fc= Q j are given. This "exponential 
fitting" system, or "Prony system", appears in several branches of theoretical and applied mathematics, such 
as frequency estimation, Pade approximation, array processing, statistics, interpolation, quadrature, radar 
signal detection, error correction codes, and many more. The literature on this subject is huge (for instance, 
the bibliography on Prony's method from [3] is some 50+ pages long). Our interest in this system (and 
other, more general systems of this kind, to be specified below) is motivated by its central role in Algebraic 
Sampling - a recent approach to reconstruction of non-linear parametric models from measurements. There, 
it arises as the problem of reconstructing a signal modeled by a linear combination of Dirac ^-distributions: 

K 

f(x)=^ai8[x-&), a^.et (1.2) 
i=i 

from the measurements given by the power moments 

m fe (/)= f / x k f(x)dx. (1.3) 
Jo 

While the above problem may be considered mainly of theoretical interest, it is actually one of the 
most basic ones in Algebraic Sampling. On one hand, if s(x) is a piecewise-constant signal with jump 
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discontinuities at the locations £1, . .. ,£jc, then s'(x) = f(x) as in (|1.2p . Thus, the "signal" /(x) essentially 
captures the non-smooth nature of s(x). On the other hand, the moments (|1.3|) are convenient to consider 
because of the respective simplicity of the arising algebraic equations, while other types of measurements 
(e.g. Fourier coefficients) may be recast into moments after change of variables. 

An important generalization of the Prony system, which is of great interest to us, arises when the simple 
model (|1.2II is extended to include higher-order derivatives (see [8| [46] for examples of such constructions) : 

K h-l 

/(^EE^^-^ ^,Oeffi (1.4) 

i=l j=0 

where 5^ is the j-th derivative of the Dirac delta (in the sense of distributions). 

From now on, we denote the number of unknown coefficients dij by C == 2^=1 ^> ano - the overall 
number of unknown parameters by R = f C + JC. Taking moments of f (x) in (|1.4p . we arrive^ at the 
following "confluent Prony" system: 

K h-l 

^ ^ = ™fc ay,^i,m fe € C (1.5) 

8=1 j=0 

where the Pochhammer symbol (i)j denotes the falling factorial 

= i{i - 1) ■ . . . ■ (i - j + 1), i e M, j e N 

and the expression (fe)j£j 3 is defined to be zero for k > j. 

The Prony-type systems appear in various recent reconstruction methods of signals with discontinuities 
-see[71lill[Tni[ni[Il[Il[2ni[2Il[21[M[lHl[Sl[3n]- In particular, Finite Rate of Innovation (FRI) 
techniques [191 [5T1 14"5] have spawned a rather extensive literature (see e.g. a recent addition [44 ). Usually, 
the represent "location" parameters of the problem, such as discontinuity locations or complex frequencies 
£j = e M * . These variables enter the equations in a nonlinear way, and we call them "nodes". The coefficients 
CLij , on the other hand, enter the equations linearly, and we call them "magnitudes". 

While Algebraic Sampling provides exact reconstruction for noise-free data in many cases mentioned 
above, a critical issue remains - namely, stability, or accuracy of solution. Stable solution of Prony-type 
systems is generally considered to be a difficult problem, and in recent years many algorithms have been 
devised for this task (e.g. [HI [23 I2H S3 EH ESI EH H21 US). Perhaps the simplest version of the stability 
problem can be formulated as follows (cf. Definition 13.11 Definition 14.11 and Subsection ll.D|) . 

Assume that the measurements {mk} k=0 «_ 1 are known with some error: m k +e k - Given an estimate 

def ~ 

e = maxfc \£k\, how large can the error in the reconstructed model parameters (i.e. |A£j| = |£ 3 - — £j\ and 

def 

A'/ . = \ai.j — Oi j\) be in the worst case in terms of e, number of measurements S and the true parameters 

U\. !•{«>,!•• 

In more detail, our ultimate goal may be described as follows: 

1. determine the qualitative dependence of the accuracy on the values of the parameters; 

2. quantify this dependence as precisely as possible; 



1 Strictly speaking, this will result in a "real" confluent Prony system. 
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3. determine how (and if at all) increasing the number of measurements (i.e. oversampling) improves 
accuracy. 

l.B. Related work. Matching the ubiquity of Prony-type systems is the impressive body of literature 
devoted to both designing methods of solution and analyzing the accuracy/robustness of these methods, see 
references above. Although there appears to be no simple answer to the above question of "maximal possible 
accuracy", several important results in this direction are available in the literature, which we now briefly 
discuss. 

Methods of solution can be roughly divided into three categories (see e.g. [JT],!!^! Section 4]): direct 
nonlinear minimization (nonlinear least squares), recurrence-based methods (such as original Prony's method 
- see Section [5]) and subspace methods (such as Pisarenko's method, MUSIC, ESPRIT, matrix pencils - see 
e.g. M)- 

In the framework of statistical signal estimation [57] , the subspace methods are known to be more efficient 
and robust to noise, mainly due to the fact that the noise is assumed to have certain statistical properties. 
The confluent Prony system (jl.5[) is also known as "polynomial amplitude complex exponential" (PACE) 
model. A standard measure of estimator performance is Cramer-Rao (and related) lower bounds (CRB). 
These have been recently established for the PACE model in [5] (see also related results for FRI models [15J). 
Furthermore, it has been demonstrated that the performance of the generalized ESPRIT algorithm ([HE] 
and Subsection I5.B[) is close to the optimal CRB, therefore we consider it to represent the state of the art in 
the subspace methods. 

We do not assume any particular statistical model or other structure for either the error terms Sk or the 
estimation algorithm (such as white noise or unbiasedness) . Therefore, the CRB and related lower bounds 
cannot provide the full answer to the stability problem as is. Still, it turns out that the stability bounds 
developed in this paper resemble the CRB as established in |5), see Subsection 15. Al below for details. 

Recent papers of Tasche et al. [34, 36] contain some uniform error bounds for solving Prony systems. In 
particular, the authors develop the so-called Approximate Prony method, analyze its worst-case error and 
numerically compare it with the ESPRIT method (showing similar performance). Although they consider 
the non-confluent version of the Prony system (jl.lj) and analyze only the error in recovering the magnitudes 
a,j , we believe these results to be an important step towards answering the stability problem as posed above. 
See Subsection 15 . CI below for details. 

Very recently, Candes et al. [TS] investigated stable solution of Prony systems by total variation mini- 
mization under assumptions of minimal node separation, in the context of super-resolution. 

Considering all the above, we believe that a full answer to our somewhat rigid l°° formulation of the 
stability problem may contribute to the understanding of limitations of using Prony systems and methods 
both in signal processing applications and in function approximation, in particular compressed sensing, 
nonlinear Fourier inversion, Finite Rate of Innovation techniques and related problems. 

l.C. Notation. In the sequel we use the infinity norm distance 

\/x, ygC: dist (x, y) = max \xi — y%\, 

l<i<n 

and denote by B (a, e) the e-ball around a point a 6 C" in this norm. 
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l.D. Summary of results. In Section [3] we define "best possible point-wise accuracy" as follows. We 
consider the "Prony map" Vs '■ C R — > C s which associates to any parameter vector x = {{fly}, {&}} € C R 
its corresponding measurement vector y — (mo, . . . , ms—i) G C s (where the are given by (|1.5[l ). 

Now if instead of y we are given a noisy y £ £>(y,e), then this y can correspond to any parameter 
vector x £ C R for which Vs (x) E B (y,e). Therefore we define the best possible accuracy at a point x to 
be equal to the maximal (over all y) spread of the preimage of this B (y,e), that is (see Definition 13. 1[) 

sup i dieanVg 1 (B (y, e)) . 

y£B(y,e) 2 

We then simplify the setting by assuming that the number of measurements S equals the number of 
unknowns R, and looking at the (local) linear approximation to the Prony map Vs- Then the solution error 
at some (non-critical) point in the parameter space can be estimated by the local Lipschitz constant of the 
(regular) inverse map Vg . We derive such simple estimates in Section 2J and compare them to the "global" 
accuracy of the original Prony method (derived for completeness in Section [5]). 

Our main result (Theorem 14. 5p can be summarized as follows (all statements are valid for small e): 

1 . The stability of recovering a node £j depends on the separation of the nodes and is inversely pro- 
portional to the magnitude of the highest coefficient corresponding to this node (|et<.i 4 _i|), and does 
not depend on any other magnitude. 

2. For 1 < j < 1^ — 1, the stability of recovering a magnitude <Zjj depends on the separation of the 
nodes, is proportional to 1 + I"'-?- 1 ! , and does not depend on any other magnitude. Note that in 

| o*, ; ^ — i | 

fact every magnitude influences only the next highest magnitude corresponding to the same node. 

3. The stability of recovering the lowest magnitudes a, ( o is the same for all nodes and it depends only 
on the separation of the nodes. 

The separation of the nodes is specified in terms of norms of inverse confluent Vandermonde matrices on the 
nodes, which is roughly of the same order as some finite power of Ili<i<j<A; \€j — £il • 

Our numerical experiments (Section \§§ confirm the above theoretical estimates. We also test the per- 
formance of two well-known solution methods - namely the recurrence-based Prony method (Section [5]) and 
the generalized ESPRIT fSubsection !5.Bp - in the same setting as above (i.e. high SNR). The results suggest 
that: 

1. The recurrence-based global Prony method does not achieve the above theoretical limits, and so it 
is not optimal even in the case of small data perturbations. 

2. The subspace methods (in particular the ESPRIT algorithm) behave better than the Prony method 
but still they are not optimal for small perturbations and small sample size. 

The "Prony map" approach can in principle be generalized to obtain both global accuracy bounds as well as 
study effects of oversampling - by considering the case S > R and taking into account second-order terms in 
the Taylor expansion of Vs- We discuss these directions in Section [7] 

1. E. Acknowledgments. We are grateful to the two anonymous referees, whose comments, suggestions 
and references were very helpful. 

2. The Prony method. In this section we describe the most basic solution method for the system 
(|1.5p . which is in fact a slight generalization of the (historically earliest) method due to Prony [37J. By 
factorizing the so-called "data matrix", one immediately obtains necessary and sufficient conditions for a 
unique solution, as well as an estimate of the numerical stability of the method. 
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Most of the results of Section [5] are not new and are scattered throughout the literature. Nevertheless, 
we believe that our presentation can be useful for further study of the various singular situations, such as 
collision of two nodes. 

2. A. The description of the method. The non-trivial part is the recovery of the nodes £ 3 -. Note 
that the case of a-priori known nodes has been extensively treated in the literature (see e.g. [TJ [35] for the 
most recent results). Using the framework of finite difference calculus, one can easily prove the following 
result (see Theorem 2.8]). 

Proposition 2.1. Let the sequence {m^} be given by (| 1 . 5[) . Then this sequence satisfies the recurrence 
relation ( of length at most C + 1) 

m(E-^W} = o 

where E is the forward shift operator in k and I is the identity operator. 

COROLLARY 2.2. For all k £ N we have the recurrence relation X^'=o Qj m k+j — where qo, qi, . . . , qc 
are the coefficients of the polynomial q(x) = YliLi ( x ~ ^i)'* • 

This suggests the following reconstruction procedurqj. 



Algorithm 1 The Prony method 



Let there be given {mk}^l 1 (where C — Yli=i h)- 

1. Solve the linear system (here we set qc = 1 for normalization) 



I m 
mi 



\m C - 



iii i 

mi 



m c 



mc-i \ / go \ 

mc qi 



m 2 c-2/ \qc-lj 



( mo \ 

mc+i 



\m 2 c-i/ 



(2.1) 



for the unknown coefficients qo 
Find all the roots of q(x) — x c 



. . .,q C -x- 

+ E i= o ii x 



These roots, with appropriate multiplicities, are the 



unknowns . ,£jc (use e.g. arithmetic means to estimate multiple roots which are scattered by 
the noise into clusters). 

Substitute the recovered £j's back into the original equations p.5p . Solve the resulting overdeter- 
mined linear system (C unknowns and 2C equations) with respect to the magnitudes {a^ } by least 
squares method. 



Several comments are in order. 

1 . The number of measurements used in step [T] equals 2C which can be greater than the number of 
unknowns R = C+IC (equality for order zero Prony system). If more measurements are available, the 
linear system (|2.ip can be modified in a straightforward way to be over determined, and subsequently 
solved by, say, the least squares method. 

2. The linear system for the magnitudes has a special "Vandermonde"-like structure (see below), and 
so certain efficient algorithms can be used to solve it (e.g. [16, 29J). 

The remainder of this section is organized as follows. The Hankel matrix Mc is shown to factor into the 

2 Equivalent derivation of the method is based on Pade approximation to the function I(z) = ^ZfcLo m k zk ~ see I37| and, 
for instance, |39| . 
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product of a generalized "Vandermonde-type" matrix which depends only on the nodes £j, with a upper 
triangular matrix depending only on the amplitudes ciij. We also write down explicitly the linear system 
for the ciij (see step [3] in Algorithm [T] above) . These calculations lead to simple non-degeneracy conditions 
and stability estimates for the Prony method. 

2.B. Factorization of the data matrix. Let us start by recalling a well-known type of matrices. 

Definition 2.3. For every j = 1, ...,/C and k € N let the symbol Uj^ denote the following 1 x lj row 
vector 



def 



(2.2) 



Definition 2.4. Let U = J7(£i,/i, . . . ,£k;,£jc) denote the matrix 



U : 



"1,0 

"1,1 



"2,0 
"2,1 



"l.C-l "2,C-1 



Uk,0 
"/C,l 



(2.3) 



This matrix is called the "confluent Vandermonde" ( |16U22) ) matrix. It has been long known in numerical 
analysis due to its central role in Hermite polynomial interpolation. Its determinant is ( |40l p. 30]) 



K lu-1 



detu= n fe-e^ii n 



(2.4) 



l<i<j<K 



fi—l u—1 



It is straightforward to see that the matrix U defines the linear system for the jump magnitudes a{ t j. 
Proposition 2.5. Let a be the column vector containing all the magnitudes {dij}, i.e. 

a — [ fl l,0: ■ • ■ j OX,ii-l} a 2,0, • ■ • 7 a 2,i 2 -li • ■ • i a Kfi, CIK,Ik-11 



and m = [mo, . . . , mc-i) T ■ Then we have 



U(£i,h,...,£ic,lK.)a = m. 



(2.5) 



It is known that every Hankel matrix H admits a factorization H = UDU T , where U is given by (|2.3p 
and D is a block diagonal matrix - see [T7]. Using different notations, such a factorization is proved in [1J 
Proposition III. 7] for the Hankel matrix Mc- 

Lemma 2.6. For the system (|1.5p . the matrix Mc admits the following factorization: 



M c = UBU T 



(2.6) 



where U = ?7(£i, h, ■ ■ ■ , 6e, Ik) is the confluent Vandermonde matrix (|2.3p and B is the C x C block diagonal 
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matrix B = diag{_Bi, . . . , B/c} with each block of size ij x 1$ given by 



an 



Oil 



(VK«i"l 



(U:5)«i 









(2.7) 



In other words, Bi is a "flipped" upper triangular matrix whose j-th anti-diagonal equals to 



i (!) 



(A) i 



for j = 0, ...,k - 1. 

The formula (|2.6I) is useful because it separates the jump locations from the magnitudes {cij,j}, 
simplifying the analysis considerably. 

Theorem 2.7. 27ie system (jTTSJ) /or fe = 0, 1, . . . , 2C /ias a unique solution if and only if all the 's 
are pairwise different and all the {di^-ij's (just the highest coefficients) are nonzero. 

Proof. Existence of a unique solution to the system (|2.1[) is equivalent to the non-degeneracy of Mc = 
UBU T . Furthermore, the system for the jump magnitudes is given by (|2.5| . Therefore, existence of a unique 
solution to p.5p is equivalent to the conditions det U ^ and det B ^ 0. The proof is completed by (|2.4[) 
and (E21). □ 

2.C. Stability estimates. The stability of the Prony method can be estimated by the condition 
numbers of the matrices B and U. In particular, we have the following well-known result (e.g. [473) from 
numerical linear algebra. 

Lemma 2.8. Consider the linear system Ax = b and let x Q be the exact solution. Let this system be 
perturbed: 



{A + AA)x = b+Ab 

and let io + Ai denote the exact solution of this perturbed system. Denote Sx = tt^t , SA 
and the condition number k — \\A\\\\A~ 1 \\ for some vector norm || • || and the induced matrix norm. Then 



\\AA\ 



,6b 



IIA&II 



Sx < 



1 — « • 8 A 



(5 A + 5b) . 



(2.8) 



Now we can easily estimate the stability of the Prony method (compare with similar estimates in [H eq. 
(19)]). 

Corollary 2.9. Let the measurements {m^} be given with an error bounded by e. Denote u = k(U), b = 
k(B). Assume that < S for all i = 1, . . . ,JC. Then the Prony method recovers the parameters {£j,ca,j} 
with the following accuracy as e — > 0: 

|A&| ~ (u 2 be)^ +o(e^) 
lAoijI ~ C^u^be) 1 ^^ +L.O.T. 



where C (S) is a constant depending on the number S. 



Proof. Using the factorization of Lemma [2.61 we obtain that n(Mc) < u 2 b. Therefore, according to 
(|2.8|) the coefhcient vector q — (qo, . . . , qc-i) is recovered with the accuracy 



u 2 bs 9 , 

< z 5T" ~ u 2 be + 0(e 2 ). 

1 — u z be 

The parameters are the roots of the polynomial with coefhcient vector q, with multiplicities 

Jjc- Therefore, by the general theory of stability of polynomial roots (see e.g. [37]) it is known that 
A£j ~ (5qr) ^ . The first part of the claim is thus proved. 

Now consider the linear system (|2.5[) for recovering the jump magnitudes. Note that the matrix U is 
known only approximately. Again, by (12. 8p we have 

s °~T^§w isu+Sm) (2 - 9) 

i 

Assuming that < S, it is easy to see that SU ~ C (S) (w 2 6e) max J ! j . Plugging this value into ()2.9I1 we get 
the desired result. □ 

Inverses of confluent Vandermonde matrices and their condition numbers are extensively studied in 
numerical linear algebra (e.g. |121 [T3I 122"] fj In general, n{U) will grow exponentially with JC and will also 
depend on the "node separation" Yl&j I £7 ~ 0l — 1 * As f° r we are n °t aware of a general formula except 

for the simplest casejf). 

Finally, notice that the stability estimates of Corollarv l2. 91 suggest that when the Prony method is used, 
the parameters of the problem are "coupled" to each other, in the sense that the accuracy of recovering 
either a node & or a magnitude a it j will depend on the values of all the parameters at once. This undesired 



3 In particular, the paper [221 Theorem 3] contains the following estimate for the norm of {U 1, . . . , fjc, 1)} 1 when the 
nodes are arbitrary complex numbers: 



where 




4 The following are estimates of the spectral condition numbers. 

• For the standard Prony system we have 

minj \a jt0 \ 

• For multiplicity 1 confluent system, assuming a,j,i ^ and denoting fij = brute force calculation gives 



k(B) = 



max, . 

\ 






minj 

\ 
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behavior is confirmed by our numerical experiments in Section [SJ 



3. Measurement set and the Prony map. Assume that the number of measurements is S > R 
(where R is the overall number of parameters in the confluent Prony system). Then we define Mr^s to be 
the se10 of all possible exact measurements, i.e. 

Mr,s = I (mo, mi, . . . , ms-i) ■ m k = Y, Y, a i,]( k )j£i 3 , a %,j 6 C, (j e C > C C s . 

[ i=l j=0 J 

This A4r.s is the image of C R under the 'Prony map" Vs '■ C R —> C s defined as 

K h-i 

(K},fc}) = K,mi,...,m s _i): '"/. Y Y "'-r 1 ^.^- "'• ( 3 - 1 ) 

i=l j=0 

Now let x = {{a,j}, G C fl be an unknown parameter vector and y = Vs ( x ) € -M_r,s its corre- 

sponding exact measurement vector. The absolute error in each measurement is bounded from above by e, 
therefore the actual measurement satisfies y £ B (y,e). Now consider the set 

Ty i£ =MR, s nB(y,e) 

of all possible noise-free measurements corresponding to the given noisy one y. Any algorithm which receives 
this y as input will therefore produce worst-case error which is at least 

idiamT^ 1 ^) 

where Vg 1 denotes the full preimage set. 

This prompts us to make the following definition. 

Definition 3.1. Assign to each one of the parameters {a.y }, {^} a unique index 1 < p < R. The best 
possible point-wise accuracy of solving the noisy confluent Prony system (|1.5|) with each noise component 
bounded above by e at the point x = ({a^}, & C R with respect to the parameter p is defined to be 

d f 1 

ACC(x,e, P )= J sup -diam p rs(M RtS nB(y,e)) 

yeB(V s {x),e) Z 

where diam p A is the diameter of the set A along the dimension p. 

Obviously ACC(x,e) will depend on the point x G C R in a nontrivial way because the chart Vs 
is nonlinear. Calculation of the function ACC may be considered as one possible answer to the stability 
problem posed in the Introduction. 

4. Local accuracy. Having given the general definition of accuracy, in the remainder of this paper we 
restrict ourselves to the "local" setting in the following sense: we assume that e is small enough so that the 
set Mr,s can be approximated by the linear part of the Prony map, and furthermore we take S = R so 
that the preimage will be given by the usual inverse function. For such an analysis to be valid, it should be 

5 Formally, Mr^s is a projection of the complex algebraic variety defined by the set of the S confluent Prony equations onto 
the corresponding S coordinate axes. If all parameters are real-valued, this is a semialgebraic set. 
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done at non-critical points of Vs so that this map is locally invertible. By definition, the point a; is a critical 
point of Vs if the Jacobian determinant of Vs vanishes at x. 

To summarize, let us give the following definition of the local accuracy which is nothing more than the 
first-order Taylor approximation to the inverse function Af = Vg 1 at a regular point of Vs- 

Definition 4.1. Assume S — R. Let x — ({«%}, {£»}) € C R be a regular point of Vs and assume 
e to be small enough so that that the inverse function Af = Vg 1 exists in e -neighborhood of y = Vs (x). 
Assign, as before, to each one of the parameters {a^}, a unique index 1 < p < R. The best possible local 
point-wise accuracy of solving the noisy confluent Prony system (|1.5p with each noise component bounded 
above by e at the point x with respect to the parameter p is 



ACCloc (x,e,p) = sup 

yeB(y.e) 



UMv) (y - y)l P 



where (y) is the Jacobian of Af at the point y and [v] is the p-th component of the vector v. 

In Theorem 14.51 below we estimate the function ACCloc- The key technical tool is the following fac- 
torization of the Jacobian of Vs which separates the nonlinear part depending on the nodes from the 
linear part which depends on the magnitudes {a,ij}. 

Lemma 4.2. Let x = ({%■}, {&}) € C R . Then 



J-p s (x) = U(ti,h + l ) ...,£ K ,l K , + l)-dia,g{D 1) ...,D K ,} 



(4.1) 



where U(. . . ) is the confluent Vandermonde matrix (|2.3p . and Di is the (U + 1) x (Zj + 1) block 



D, 



def 



1 
1 







••• ai , u -x. 



(4.2) 



Proof We have by (jXTj) 

da l j 
dm k 



3=0 3=1 



The rest of the proof is just a straightforward calculation. □ 

COROLLARY 4.3. x = ({dy}, {£j}) G is a critical point ofVs if and only if at least one of the 
following conditions is satisfied: 

1. £j = £j for any pair of indices i =/= j . 

2. diii-i = for any 1 < i < K.. 



COROLLARY 4.4. Letx € C be a regular point ofVs- Then the Jacobian matrix of the inverse function 
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Af = V s 1 at y = Vs{x) is equal to 



n- I \ X t < M- 1 ^(°10,---,ai,Ji-l)^l>---) a /C,0,---,a/C,lx:-l,^/c) 

JM{y) = {Jvs {x)\ = j- r 

d{m ,...,m R - 1 ) 

= diag{D-\ . . . , D K X } ■ U-\l- u h + 1, i/c + 1) 



where 



1 ••• 

l ••• f-iVi-i " " 



I lUj-1 "i,0 



0- 



(4.3) 



Now we are ready to formulate and prove our local stability result. 

Theorem 4.5. Assume S — R. Let x — {{dij}, {&}) € C" be a regular point ofVs and assume e to be 
small enough so that that the inverse function M — Vg 1 exists in e -neighborhood of y = Vs (x). 

Then there exists a positive constant C\ depending only on £i, . . . ,£jc an d h,---,l/c such that for all 
i = 1, . . . ,/C 



ACClqc {x, e, dij) 



Cie 



ACClqc {x,e,£i) = C\e- 



1 



\ai,k-i\ 



.1=0 



Proof. Express the Jacobian matrix Jj^{y) as 



JAv) 



'10 



t 1 t 1 



where 



'n0 



del 



def 



daij daij 
dmo dmi 

9mo 9m i 



drns 



9ms_ 



Let y = (mo + Ato , . . . , mj_i + Ams_i) where each |Amfe| < e. Denote by || • ||i the ii vector norm, i.e. 
if v = (vi) is an n-vector then =' Y^i=i \ V A- Then 

\Jx (y) (y - y)] aij 
[Jn (y) (y - y)] Cl 



k=0 
P-l 



< £ S 



fc=0 
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<9m fe 



U 111; 



By Corollary 14. 41 the matrix is the product of the block diagonal matrix D* = f diag{D 1 1 , . . . , -D^ 1 } with 
the matrix U* = h + 1, . ■ ■ , £/c, Ik, + 1)) -1 - Therefore, sy and ti are the products of the corresponding 

, . ^ TT* — (n,, ,\ TV, 



rows of D~ x with U* . Let Dr 1 = (c£*\) and U* = (u k ,i). Then: 



fe=i 



•«iii=EE4K* -EK-SiEk* 



Z=l fe=l 



and likewise 



U+i 

id) 



l^lli<EKiwlEK^i 



i=l fc=l 

Let || • ||oo denote the "maximal row sum" matrix norm - i.e. for any n x n matrix C — (cy) we have 
||C||oo = maxj=i,...,„ J2]=i 

Denote C\ = f ||C/*||oo. Then substitute for d^fl the actual entries of D^ 1 from (|4.3[) into the above and 
get the desired result. □ 

5. Comparison with known results. 

5. A. CRB for PACE model. The confluent Prony system (|1.5[) is equivalent to the PACE model 
[H[S]. The Cramer- Rao bound (CRB) (which gives a lower bound for the variance of any unbiased estimator) 
of the PACE model in colored Gaussian noise is as follows (note that the original expressions have been 
appropriately modified to match the notations of this paper). 

Theorem 5.1 ([5, Proposition III. 1] ) . Let the noise variance be a 2 , t/ier|§ 

2 

CRB } = C2 — 2 2 ' 

161 \<H,h-l\ 

CRB{a lfl } = C 3 a 2 , 



CRB{a hJ } = do- 2 C 5 



H,j-1 



2 



+ c 6 m{^^} + i\ j = i,2 ) ...,i i -i, 



where C2, . . . , Cq are constants depending on the configuration of the nodes while in addition Cn,C$, Cq 

depend on the index j . 

As mentioned in Subsection II. Bl there exist several essential differences between our setting and the 
statistical signal estimation framework, in particular: 

1. no a-priori statistical model of the noise is available; 

2. no assumptions on the reconstruction algorithm (estimator) such as unbiasedness are made; 

3. measure of performance is the worst-case error rather than estimator variance. 

The expressions for the CRB in Theorem 15.11 are very similar to the local point- wise accuracy bounds of 
Theorem 14. 5 1 The reason for such similarity is not a-priori clear (although it could be partially attributed to 
the fact that both methods require calculation of the partial derivatives of the measurements with respect 
to the parameters), and it certainly prompts for further investigation. 



Here 5H(-) denotes the real part. 
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5.B. ESPRIT method. The ESPRIT algorithm is one of the best performing subspace methods 
for estimating parameters of the Prony systems with white Gaussian noise. Originally developed in the 
context of frequency estimation [13] Section 4.7], it has been generalized to the full PACE model pQ, and its 
performance has been shown to approach the CRB in the case of high SNR and infinite observation length. 

In essence, the ESPRIT (and other subspace methods) relies on the following observations: 

1 . The range (column space) of both the data matrix Mc (|2.1j) and the confluent Vandermonde matrix 
U (|2.3[) are the same (follows directly from (|2.6|) ^) ; 

2. the matrix U has the so-called rotational invariance property ([!]): 

& = U^J 

where IP" denotes U without the first row, C/j. denotes U without the last row, and J is a block 
diagonal matrix whose i-th block is the Zj x Zj Jordan block with the number £j on the diagonal. 
Suppose we knew U, then the matrix J could be found by 

J = Ujrf 

(where # denotes the Moore-Penrose pseudo-inverse) and then the nodes ^ could be recovered as the 
eigenvalues of J. 

Unfortunately, U is unknown in advance, but suppose we had at our disposal a matrix W whose column 
space was identical to that of U. In that case, we would have W — UG for an invertible G, and consequently 

W r = Wi® 

where 

$ = G^JG 

which means that the eigenvalues of $ are also Such a matrix W can be obtained for example from 

the singular value decomposition (SVD) of the data matrix/ covariance matrix. To summarize, the ESPRIT 
method for estimating as used in our experiments below, is as follows. 

Algorithm 2 ESPRIT method for recovering the nodes 

Let Ms be a rectangular n x Z Hankel matrix built from the measurements. 

1. Compute the SVD M s = WT,V T . 

2. Calculate $ = WfW 1 '. 

3. Set to be the eigenvalues of $ with appropriate multiplicities (use e.g. arithmetic means to 
estimate multiple nodes which are scattered by the noise). 



Note that the dimensions n, Z are not fixed a-priori, but in [6| it is shown that taking n = 21 or Z = 2n 
results in optimal performance for non-confluent Prony system 

Since the performance of the ESPRIT method is close to the CRB which, in turn, resembles our local 
bounds, we regard the ESPRIT as the best candidate among the "global" solution methods of the confluent 
Prony system. It should be noted, however, that the analysis of ESPRIT as presented in [6j suggests a 
relatively complicated dependence of the estimator performance on the model parameters for small number 
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of measurements S. 



5.C. Approximate Prony method. In [35] the authors develop the Approximate Prony method 
for solving the system (|l.ip (restricting £j to be of unit length), and analyze its performance for small 
measurement errors. In more detail, the model is defined as 

M 

h (x) = cj e' fjX iel, cj e C, fj e (-7T, 7r) . 

The measurements are given with errors 

h(k) = h(k) + e k , k = 0,...,2N 

where the number of measurements N satisfies N > 2AI + 1. Finally, the coefficients Cj are assumed to be 
large with respect to the noise level, i.e. 

|e fc | <ei« \cA. 



The proposed solution method is as follows. 



Algorithm 3 Approximate Prony method. 

1. Build the Hankel matrix H G C 2N ~ L ' L from the measurements where L is an upper bound on the 
number of nodes. Compute singular value decomposition of H , and take the smallest nonzero singular 
value and its singular vector v = (vi) . Finally, compute the roots of the polynomial p (z) = Ylt=o Viz 1 . 
These are the approximations of {fj} . 

2. Find {cj} by solving an overdetermined Vandermonde linear system. 



The stability analysis of the APM is performed only for the step [2] above, assuming that the frequencies 
{fj} have been recovered with high accuracy. [361 Theorem 5.2] gives the following estimate: 



Vnm 



fj-fj 



max \hf.\ + max |A/ifc| . (5-1) 

k k 



While missing explicit analysis of step [T] above (however, the actual numerical accuracy of this step was 
shown in [34] to be comparable to the performance of the ESPRIT method) and dealing with single poles 
only, these results may provide an important insight as to the dependence of the accuracy on the number of 
measurements N, as well as to the applicability of the Vandermonde inversion for recovering the magnitudes 
(the errors in fact increase with N\) In addition, the authors notice that the accurate recovery of the 
magnitudes depends greatly on a sufficient accuracy of recovering the nodes, and this fact is also reflected 
in our numerical experiments (Section 

6. Numerical experiments. In our numerical experiments we had two distinct goals: 

1. Numerically investigate the "best possible local accuracy" of inverting (| 1 . 5[) as a function of the 
various parameters of the problem, and compare the results with the predictions of Theorem 14.51 

2. Ascertain whether there exist some regular patterns in the behavior of the global solution methods 
(Prony and ESPRIT) in a similar "local" setting, and compare their performance to the optimal one. 
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6. A. Experimental setup. 

1. Given JC, d, choose the jumps £1, . • • ,£/c € [0, 1] and the magnitudes ai, , ■ • ■ , a/e,d-i € [ — lj 1]- 

2. Change one or more of the parameters according to a particular experiment. 

3. Calculate the perturbed moments rhk = mk + £k where is given by (jl.5p and -C 1 (on the 
order of 10 -10 ) are randomly chosen. 

4. Invert (| 1 - 5[) with the right hand side given by rhk by one of the three methods: 

(a) Nonlinear least squares minimization (using MATLAB's lsqnonlin routine) with the initial 
guess being very close to the true parameter values. This is our simulation of the "local" setting. 

(b) Global Prony method - Algorithm [TJ 

(c) ESPRIT method - Algorithm H 



5. Calculate the absolute errors |A£j 



and |Aoi j| = 



In all the experiments we took JC — 2. All solution methods were applied to the same moment sequence 
{mk}. The number of measurements is the minimal necessary for exact inversion, namely R for least squares 
and 2C both for Prony and ESPRIT. 

6.B. Results. 

6.B.I. Changing the highest coefficient. In the first set of experiments, we checked how the recon- 
struction errors |A£j| , |Aoj A depend on the magnitude of the highest coefficient |eti ^ i j. The results are 

presented in Figure 16.11 on page [TBI (a-c) . 

For both least squares and ESPRIT (but not for Prony), the inverse proportionality |A£j| ~ a * n is 
seen in Figure |6~TI on page [TBI fa), (c), matching the theoretical predictions of Theorem 14.51 

For LS and ESPRIT, the errors |Aaj,j| seem to be unaffected by the increase in loi,^— 1|. This can 
be explained very well by the formula lAo^] ~ 1 + ^ so that indeed |Aajj| should remain close to 

constant as |a< j 4 — 1| oo. 

The Prony method's performance with respect to the recovery of the magnitudes actually degrades with 

the increase in l***,^ 1 1 . Although both Prony and ESPRIT use the same method for the recovery of the 

magnitudes, it appears that the initial error in recovering the nodes, which is significantly smaller in ESPRIT 
(see Subsection 16 ,B ,3l below) . influences this step greatly - in accordance with the predictions of [36, 34 (sec 
also discussion in Subsection I5.C[) . 

In addition, the Prony method fails to separate recovery of a node and its magnitudes (say A£i, Aaij) 
from the highest magnitude associated with another node (e.g. |<Z2,i 2 — 1 1) - these results are not shown for 
saving space. 

6.B.2. Changing coefficient other than the highest. In the second set of experiments, we changed 
the magnitude of some coefficient other than the highest, i.e. aij for j < li — 1. The results are presented 
in Figure [5TTI on page[T|5] (d-f). 

For the least squares method, the dependence of |Aa*j| on the "previous" magnitude |a;j_i| for j ^ is 
consistent with the formula |Aajj| ~ 1+ " sucn a behavior should be visible when |a$ )3 -_i| ^> |oj,i 4 -i|, 

as can indeed be noticed in Subfigure l6.1dl In addition, the other magnitudes and the jumps are unaffected, 
as predicted. 

On the contrary, neither Prony nor ESPRIT succeed in confining the influence of |aij_i| only to the 
recovery of the next magnitude |Aaij| . In particular, |A£i| increases with |<Xi,o| in both of them. The error 
in all the magnitudes grows with |cti,o|j as opposed to the least squares where only |Aai,i| is increased. 
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Accuracy of solving confluent Prony system 
Degree=2, K=2, log |e|=-1 0.000000 
m eth o d = Le ast sq u ares 



Accuracy of solving confluent Prony system 
Degree=2, K=2, log |e|=-1 0.000000 
method=Prony 





(a) Least squares 



(b) Prony 



Accuracy of solving confluent Prony system 
Degree=2, K=2, log |e|=- 10.000000 
method- ESPRIT 




Accuracy of solving confluent Prony system 
Degrees, K=2, log |e|=-1 0.000000 
method=Least squares 




(c) ESPRIT 



(d) Least squares. Note the growth of |Aai,i| 



Accuracy of solving confluent Prony system 
Degree=1, K=2, log |e|=-1 0.000000 
method=Prony 



Accuracy of solving confluent Prony system 
Degrees , K=2, log |e|=-f 0.000000 
method-ESPRIT 





(e) Prony 
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(f) ESPRIT 



Figure 6.1: (a-c): Dependence of the reconstruction error on the magnitude of the highest coefficient, degree = 2. 
(d-f): Dependence of the reconstruction error on the magnitude of the "previous" coefficient, degree = 1. 



(a) Least squares 



(b) Prony 



(c) ESPRIT 



Figure 6.2: Reconstruction error as e — > 0, degree = 2. 




6.B.3. Dependence on the measurement error. In the next experiment, we kept all the parameters 
constant and changed the magnitude of the error max^ Sk ■ The results are presented in Figure 16.21 on page 
[171 The ESPRIT performs slightly better than Prony, but both of them are worse than the optimal least 
squares. Note however that the asymptotic error (the slope) is O (e) in spite of the fact that both algorithms 
involve extraction of multiple roots which should decrease the accuracy to O (^ £ ^ where d is the order of the 
pole. This phenomenon can be explained by the effect of averaging the clustered roots (see [4] Proposition 
V.3]). 

6.B.4. Dependence on the model order. Next, we checked the dependence of the reconstruction 
error on the model order D = f maxi=i „ k!(. The results are presented in Figure 1531 on page [T71 The 
reconstruction error for all the parameters grows exponentially in D for all the methods. 

6.B.5. Dependence on the node separation. Finally, we checked the dependence of the reconstruc- 
tion error on the distance between the two nodes |^2 — The results are presented in Figure \6A\ on page 
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(a) Least squares (b) Prony (c) ESPRIT 

Figure 6.4: Dependence of the reconstruction error on the node separation. 



IT51 For all the three methods, the results are consistent with 

|A&|,|Aa i)i |~|£2-£ 1 r D . 

6. C. Conclusions. In the numerical experiments we have investigated the "best possible local accuracy" 
via the least squares method, comparing it both with the theoretical results of Theorem 14.51 and with the 
performance of two "global" solution techniques, namely Prony and ESPRIT methods, for small perturbations 
(high SNR). Our results suggest that: 

1 . The numerical behavior of the solution in the case of small data perturbations indeed exhibits the 
patterns predicted by Theorem 14.51 in particular the qualitative dependence of the reconstruction 
error on the values of the parameters of the problem. 

2. The Prony solution method largely fails to separate the parameters which could be separated in 
theory. Furthermore, its performance actually degrades when the highest coefficient |oj j 4 _i| is 
increased. ESPRIT separates the parameters better than Prony, but is still worse than optimal. 

3. In terms of absolute reconstruction error, ESPRIT is better than Prony but still worse than the 
optimal LS. 

4. In terms of dependence of the reconstruction error on the model order and the node separation, both 
Prony and ESPRIT behave close to the predicted law, namely exponential increase in the order and 
polynomial increase in the separation distance. 

7. Discussion. We believe that the analytically approach of this paper has the potential to provide 
relatively complete answer to several important questions related to stable solution of Prony-type systems, 
as briefly discussed below. 

The numerical experiments suggest that the least squares method approximates the optimal "local" 
behavior very well. However, it is well-known that a very accurate initial approximation is required in order 
to find the global minima. It is customary to use one of the global solution methods to obtain such an initial 
value. Further analysis of the Prony sets Mr,s may provide explicit conditions for such an initialization to 
be sufficiently close to the true solution. 

The general case S > R should be well-understood in order to estimate the feasibility of taking more 
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measurements than strictly needed (oversampling). Without assumptions on the noise, it is not a-priori 
obvious that averaging should improve the accuracy in any way. Again, such an understanding is hopefully 
achievable via the investigation of M.r,s with S 3> R. 

In practice it is often the case that neither the number of nodes K, nor the numbers {^} are known a-priori, 
but only their upper bounds. In this case, given a noisy measurement vector, more than one "explanation" is 
possible for this data, in which case a good reconstruction algorithm needs somehow to select the "optimal" 
configuration. One possible way to achieve this goal is to characterize, for each configuration of the system 
(i.e. |/C, {£i}JL]l)) the "stable regions" of the corresponding measurement sets M.r,Si f° r which the accuracy 
function ACC does not exceed a predefined upper bound. Based on the initial measurement y € C s and the 
error bound e, an algorithm would choose the closest "stable measurement set", i.e. select a configuration 
for which the local accuracy is optimal. Using this approach, collision of two nodes £j, £j can in principle be 
handled in a stable way by substituting the configuration |/C, {Zjj-J.-L j with {K, — 1, . . . , li + lj, . . . , Ik}} 
once the measurement vector leaves the stability region associated with the former configuration. In this 
regard, we note that such a singular behavior has been studied in [35| (see also [53]), where it is shown that if 
the solution is represented in the basis of divided differences, then the inverse operator is uniformly bounded 
with respect to the corresponding expansion coefficients. Analogous developments for extraction of multiple 
roots of polynomials [33] might be very relevant as well. 

In order to achieve the above goals, we propose to compute the function ACC as accurately as possi- 
ble. For that purpose, more detailed analysis of the Prony map@ is necessary. In particular, its essential 
nonlinearity should be quantified using the second-order terms in the Taylor expansion. 

In addition to (jl.5[) . other generalizations of the basic Prony system (jl.ip appear in applications. One 
such extension arises in Eckhoff's method for reconstructing piecewise smooth functions from Fourier 
coefficients. There, an additional parameter appears: namely, the measurements mk are given starting from 
some large index k = M. In we have presented an algorithm for solving this system with high accuracy 
(in the sense of asymptotic rate of convergence as M — > oo.) However, the question of "maximal possible 
accuracy" for this problem is still open. It will be most desirable to reinterpret those results in the sense of 
global stability bounds for Prony-like systems. 
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